The behavioral signature of stepwise learning strategy in male rats and its neural correlate in the basal forebrain

Studies of associative learning have commonly focused on how rewarding outcomes are predicted by either sensory stimuli or animals’ actions. However, in many learning scenarios, reward delivery requires the occurrence of both sensory stimuli and animals’ actions in a specific order, in the form of behavioral sequences. How such behavioral sequences are learned is much less understood. Here we provide behavioral and neurophysiological evidence to show that behavioral sequences are learned using a stepwise strategy. In male rats learning a new association, learning started from the behavioral event closest to the reward and sequentially incorporated earlier events. This led to the sequential refinement of reward-seeking behaviors, which was characterized by the stepwise elimination of ineffective and non-rewarded behavioral sequences. At the neuronal level, this stepwise learning process was mirrored by the sequential emergence of basal forebrain neuronal responses toward each event, which quantitatively conveyed a reward prediction error signal and promoted reward-seeking behaviors. Together, these behavioral and neural signatures revealed how behavioral sequences were learned in discrete steps and when each learning step took place.


Figure S2. Identification of BF bursting neurons and their responses to behavioral events
Responses of individual BF bursting neurons (n=1013) and other BF neurons (n=440) to behavioral events (stimulus onset and reward) in the S left sound trials (two left columns) and in the S right light trials (two right columns). Recordings from different sessions (N=45 sessions; separated by thin red lines) and different animals (N=7 rats; separated by cyan dotted lines) were separated by horizontal lines. Lower panels showed the average response (thick lines) pooled across individual animals (thin lines). BF activities in catch trials (thick black lines) were plotted for comparison. Responses to the stimulus onset event were calculated based on all trials in that session, regardless of subsequent behavioral responses (licking or not). On the other hand, responses to the reward were calculated based only on correct licking trials. Conventions as in Figure 3b-c.
BF bursting neurons were defined as BF single units whose average firing rates during the [0.05, 0.2]s window after stimulus onset (yellow shaded interval) increased by more than 2 spikes/s in the S left sound trials compared to the corresponding window in catch trials. This contrast between sound trials and catch trials was necessary because many BF neurons changed their activity during the foreperiod while waiting for stimulus onset. In addition, BF bursting neurons should have baseline firing rates less than 10 spikes/s. The activities of BF bursting neurons in S left sound trials were highly similar across sessions and across animals (Figure 3c-d).
Note that the PSTHs aligned to the stimulus onset event shown here (as well as those in Figure 3b-d) did not exclude spikes that occurred after fixation port exit, unlike the truncated PSTHs in Figures 5-7. The truncation procedure used in Figures 5-7 was needed to disambiguate BF responses to light onset from the increased BF activities after fixation port exit (i.e. evaluation responses).
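The classification rule for BF bursting neurons described in this legend can be sketched as follows. This is a minimal illustration and not the authors' analysis code: the function names, the data layout (one list of spike times per trial, in seconds relative to stimulus onset) and the alignment of catch trials to a matched pseudo-onset time are all assumptions.

```python
from statistics import mean


def mean_rate(trials, window):
    """Mean firing rate (spikes/s) inside `window`, averaged across trials.

    `trials` is a list of per-trial spike-time lists (assumed layout),
    with times in seconds relative to stimulus onset.
    """
    start, end = window
    duration = end - start
    rates = [sum(start <= t < end for t in spikes) / duration for spikes in trials]
    return mean(rates)


def is_bursting(sound_trials, catch_trials, baseline_rate,
                window=(0.05, 0.2), min_increase=2.0, max_baseline=10.0):
    """Apply the two criteria from the legend to one single unit.

    1. The rate in the [0.05, 0.2] s window in sound trials exceeds the rate
       in the corresponding catch-trial window by more than 2 spikes/s.
    2. The baseline firing rate is below 10 spikes/s.
    """
    increase = mean_rate(sound_trials, window) - mean_rate(catch_trials, window)
    return increase > min_increase and baseline_rate < max_baseline


# Hypothetical toy data for one unit.
sound = [[0.06, 0.10, 0.15], [0.07, 0.12]]
catch = [[0.10], []]
print(is_bursting(sound, catch, baseline_rate=4.0))
```

Comparing against catch trials rather than against a pre-stimulus baseline follows the rationale given in the legend: many BF neurons changed their activity during the foreperiod.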

Figure S3. Behavioral and BF response patterns in sound trials during the D1 session
a, Behavioral response patterns in sound trials, catch trials and light trials during the D1 session, aligned at the behavioral transition point identified in Figure 4. All 7 animals were included in this analysis. Since the three trial types were presented at equal probabilities, the number of trials in each trial type was roughly the same. The abrupt behavioral transition also led to increased error rates (rightward licks) in sound trials, from 2.4±5.5% (mean±std, N=7 rats) in pre-transition trials to 12.0±8.4% in post-transition trials. While this increase in error rates was substantial, behavioral performance remained correct in the vast majority of post-transition sound trials in the D1 session. For comparison, the overall error rate in sound trials across sessions was 6.4±3.3% (mean±std, N=7 rats). b, Single trial BF responses in sound trials in the example D1 session from Figure 3e2, aligned at stimulus onset and outcome. BF activities at the two epochs were calculated at [0.05, 0.2]s after sound onset and [0.05, 0.35]s after 3rd lick. Sound trials with incorrect rightward licks were highlighted. Notice that BF responses to sound onset and reward in correct (leftward licking) trials remained fairly stable throughout the D1 session. However, BF activities in error sound trials were inhibited prior to receiving the outcome. c, Average BF activities aligned at the three behavioral events, plotted separately for pre-transition correct trials, post-transition correct trials, and post-transition error trials.
Horizontal lines indicate significant differences in population BF activities (p<0.01 for 3 consecutive steps, using a two-sided paired t-test in each 100 ms sliding window with 10 ms steps). BF activity in error sound trials was significantly inhibited compared to post-transition correct trials, in the epochs prior to receiving the outcome. Task symbols in 3a and 3c were adapted from Avila I, Lin SC (2014)

To determine the timing for the elimination of catch licks, we calculated the difference between the cumulative sums of the rightward licking numbers (normalized by the number of each trial type in the session) between light and catch trials. The lack of behavioral response differences between light and catch trials would result in a zero-slope line. Thus, the timing for the emergence of BF or behavioral responses can be visualized as the divergence point of cumulative sum traces away from the horizontal line. In this example session, BF responses to the new light emerged earlier than the elimination of catch licks, indicated respectively by the red and blue arrows. b, Each panel represents cumulative sums from one animal (each row) and one session (each column, relative to the D2 session). To account for individual variability in BF response amplitudes across animals, the cumulative sums of BF responses were scaled relative to the maximum of the cumulative sum in the D2+2 session in each animal. The increasing slopes of the cumulative sums from the D2 to D2+2 sessions reflected the increasing response amplitudes to the new light (as shown in Figure 7a).
At the neuronal level, the lack of BF responses to the new light, which was evident in the D2-1 sessions, resulted in zero-slope cumulative sums. On the other hand, the presence of BF responses to the new light resulted in cumulative sums with positive slopes, which first emerged in D2 sessions (red arrows) and remained positive afterwards. At the behavioral level, the lack of behavioral response differences between light and catch trials, which was evident in the D2-1 sessions, resulted in zero-slope cumulative sums. On the other hand, the elimination of catch licks resulted in cumulative sums with positive slopes, which emerged between the D2 and D2+2 sessions in different animals (blue arrows) and remained positive afterwards.
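The cumulative-sum comparison between light and catch trials described in this legend can be sketched as below. This is an illustrative reading of the description, not the authors' code: the function name, the trial-type labels and the choice to report one trace value per trial in session order are assumptions.

```python
def cumulative_difference(trial_types, licks):
    """Difference between normalized cumulative lick counts (light minus catch).

    `trial_types` lists the type of each trial in session order and `licks`
    holds the rightward-lick measure for the same trials (assumed layout).
    Each trial type is normalized by its number of trials in the session, so
    equal lick rates in light and catch trials give a flat, zero-slope trace.
    """
    n_light = trial_types.count("light")
    n_catch = trial_types.count("catch")
    cum_light = 0.0
    cum_catch = 0.0
    trace = []
    for kind, lick in zip(trial_types, licks):
        if kind == "light":
            cum_light += lick / n_light
        elif kind == "catch":
            cum_catch += lick / n_catch
        # Other trial types (e.g. sound) leave both sums unchanged.
        trace.append(cum_light - cum_catch)
    return trace


# Hypothetical toy session: the animal stops licking in the second catch trial.
types = ["light", "catch", "light", "catch"]
licked = [1, 1, 1, 0]
print(cumulative_difference(types, licked))
```

In such a trace, the trial at which the values start to leave zero and keep rising corresponds to the divergence point that the legend identifies visually.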
Note that the timing of the emergence (red and blue arrows) was visually determined based on when the cumulative sums diverged from the horizontal line (y=0). We were not able to apply a universal criterion to rigorously define this timing because of individual differences in learning dynamics. Nevertheless, the overall pattern supports that the emergence of BF responses to the new light (all in D2 sessions) occurred earlier than the elimination of catch licks (between the D2 and D2+2 sessions) in all animals. c, The average BF activity (mean ± s.e.m.) in light lick and catch lick trials in the corresponding sessions (relative to the D2 session). The time window ([0.1, 0.3]s) for calculating BF responses to the new light is highlighted in yellow.

Figure S6. Increased BF activity predicted reward-seeking behaviors in the absence of the light stimulus
a1, Average responses of BF bursting neurons (mean ± s.e.m.) in catch trials aligned at fixation port exit, plotted separately for lick and no lick trials. a2, Distributions of AUC values from comparing BF activities in the [0, 0.5]s window after fixation port exit (yellow shaded interval in a1) between lick and no lick trials within the same trial type (catch or light trials) (two-sided AUC analysis). Within the same trial type, increased BF activities reliably predicted reward-seeking behavior toward the right reward port. Only light trials from pre-D2 sessions were included because BF responses to the light stimulus had not developed. a3, Pooled histogram of all sessions in a2 to show the proportion of statistically significant sessions in light or catch trials. b, Average responses of BF bursting neurons (mean ± s.e.m.) in catch lick trials aligned at the first lick (left). BF activities, relative to their respective baseline firing rates, were truncated at the median lick duration of the respective sessions. The timing of the last lick (mean ± s.e.m.) of the corresponding sessions was shown above each trace.
BF activities prior to the start of licking in catch lick trials (yellow shaded interval, left panel) were positively correlated (Pearson correlation) with the median lick duration in individual sessions (right). Each circle indicates one session and different colors correspond to different animals.

Figure S7. Stepwise learning of a two-element behavioral sequence
a, The conceptual model of how a two-element sequence is learned using the stepwise strategy. a1, A model depicting the two steps in learning a two-element (A-B) behavioral sequence. Conventions as in Figure 1b. a2, In this experiment, the A element corresponded to the light stimulus, and B the approach and licking response. Only the A-B sequence was rewarded. Task symbols were adapted from Avila I, Lin SC (2014) Motivational Salience Signal in the Basal Forebrain Is Coupled with Faster and More Precise Decision Speed. PLOS Biology 12(3): e1001811. https://doi.org/10.1371/journal.pbio.1001811. a3, The model predicts that, during the first step of learning, two reward-seeking behavioral sequences (light licks and no-light licks) will be observed, and both should be accompanied by similar levels of BF activities. b, New learning task. House light served as the reward-predicting stimulus. Licking at the reward port within a 5-sec response window led to water delivery starting at the third lick. Intertrial interval (ITI) was randomly chosen between 5 and 14 sec. Licking in the absence of the light stimulus was not rewarded and reset the ITI counter. Three types of behaviors (light licks, light no-licks, no-light licks) were depicted in this schematic. No-light lick was defined as a licking cluster of at least 3 licks in length, with its first lick occurring outside the 5-sec response window of light onset, and at least 3 sec after the end of the last lick cluster. c, The average responses of BF bursting neurons to light onset during the first session of new learning in one example rat.
20/29 BF neurons recorded in this session were classified as BF bursting neurons based on two criteria: (1) BF activities in the [0.1, 0.3]s window after light onset increased by 2 spikes/s over baseline firing rates; (2) Baseline firing rates were within [0.1, 10] spikes/s. d, Behavioral and BF neuronal responses during the first learning session, plotted against trial sequence (x-axis) in this session (315 trials). Top panel, behavioral responses across trials. Second panel, population activities of BF bursting neurons across the same trial sequence (x-axis). Y-axis indicates time in each trial, with time zero aligned at the trial outcome (defined as the timing of the third lick). Light no-lick trials were aligned instead at the time of stimulus onset (pink triangles) such that the median timing of light onset in light lick and light no-lick trials were equivalent. The black lines to the right of the panel indicate the time windows for calculating evaluation and outcome responses. Third panel, BF evaluation responses, plotted separately for the three trial types. Evaluation response was calculated as the maximum firing rate of any 500 ms window during [-1.75, 0]s before trial outcome. Circles indicate BF activities in single trials and lines indicate their respective trends (10-trial moving medians). Note that during the first quartile of trials (Q1), evaluation responses were similar between light licks and no-light licks, which became distinct in later quartiles. Fourth panel, BF outcome responses across trials, plotted separately for the two trial types with licking behaviors. Outcome response was calculated as the mean BF activity during [0.05, 0.35]s after the 3rd lick. Outcome responses to reward delivery became weaker over trials. e, Average BF responses to light onset and reward delivery in the four trial quartiles. Notice that during Q1, there was no phasic BF response to light onset.
This suggests that the increased BF evaluation responses in Q1 (third panel in d) did not result from BF responses to the light onset, and instead reflected an internal evaluation signal that was similarly present in both light licks and no-light licks. This observation supports the stepwise learning strategy (panel a), with Q1 trials corresponding to the first step of learning, during which animals engaged in two types of reward-seeking behaviors: light licks and no-light licks. Light licks in Q1 trials were not driven by the light stimulus. Learning about the light stimulus occurred in later trials when BF responses to light onset emerged, which was also when BF evaluation responses began to diverge between light licks and no-light licks.
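The evaluation response and its trial-by-trial trend, as defined in panel d of this legend, can be sketched as follows. This is only an illustration under stated assumptions: the function names, the 10 ms step between candidate windows and the use of a trailing (rather than centered) moving median are not specified in the legend and are chosen here for simplicity.

```python
from statistics import median


def evaluation_response(spike_times, span=(-1.75, 0.0), width=0.5, step=0.01):
    """Maximum firing rate (spikes/s) over all `width`-second windows in `span`.

    `spike_times` are in seconds relative to the trial outcome (third lick).
    The 10 ms `step` between candidate windows is an assumption.
    """
    start, end = span
    n_steps = int(round((end - start - width) / step))
    best = 0.0
    for i in range(n_steps + 1):
        lo = start + i * step
        hi = lo + width
        count = sum(lo <= t < hi for t in spike_times)
        best = max(best, count / width)
    return best


def moving_median(values, window=10):
    """Trailing moving median; early trials use the values available so far."""
    return [median(values[max(0, i - window + 1): i + 1]) for i in range(len(values))]


# Hypothetical spike times for one trial; the spike at 0.2 s falls after the outcome.
spikes = [-1.0, -0.9, -0.8, 0.2]
print(evaluation_response(spikes))
print(moving_median([1, 3, 2, 10], window=2))
```

Taking the maximum over sliding windows, instead of a fixed window, allows the evaluation response to be captured even when its timing relative to the third lick varies from trial to trial.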